Multi-Modal Cognitive States: Augmenting the State in Cognitive Architectures
Abstract
Different streams of AI idealize different aspects of human cognition. Idealization of intelligence as an embodied activity, involving an integration of cognition, perception and the body, places the tightest constraints on the design space for AI artifacts, forcing AI to deeply understand the design tradeoffs and tricks that biology has developed. I propose that a step in the design of such artifacts is to broaden the notion of cognitive state from the current linguistic-symbolic, Language-of-Thought framework to a multi-modal one, where perception and kinesthetic modalities participate in thinking. This is in contrast to the roles assigned to perception and motor activities as modules external to central cognition in the currently dominant theories in AI and Cognitive Science. I develop the outlines of this proposal, and describe the implementation of a bimodal version in which a diagrammatic representation component is added to the cognitive state.

AI and Idealization of Human Cognition

Before I present my proposal on the multi-modal cognitive state, I wish to digress a little by discussing possible relations between AI system design and cognitive science, or as I prefer to reword it, between AI and human or biological cognition. There is always a connection, since all of AI is based on an idealization of one or another aspect of biological cognition. Different AI approaches take different aspects of human cognition as the defining characteristic of intelligence. The logic approach to AI is based on a model of cognition as reasoning, idealizing the linguistic/logical aspect of deliberative thought. Proposed alternatives to logic, such as frames and scripts, use a different idealization, viz., cognition is what memory does. Newell and Simon’s means-ends methods of problem solving and their successor, the Soar architecture (Newell, 1990), are modeled after human deliberative problem solving and learning.
In a different vein, Brooks’ robotics work (Brooks, 1986) is based on a model of human physical behavior as arising less from reasoning and problem solving and more from a specific kind of hierarchical organization of a behavioral repertoire. The different idealizations are helpful in building AI systems of specific types. For building knowledge-based systems, or a system to prove theorems in Group Theory, perhaps modeling intelligence as goal-directed logical or quasi-logical reasoning is sufficient. It makes no sense to build such a system using Brooks’ subsumption architecture.

Consider, on the other hand, the design of a coffee-making robot: a robot that would, among other things, make coffee and bring it up in response to a voice command in English; and, if in the process it finds that the refrigerator is out of milk, would drive to the supermarket, buy milk and finish making the coffee. This coffee-making, spoken-language-understanding robot places the most rigid constraints on models of cognition. Unlike Group Theory theorem provers, this robot would have to coordinate perception and motor action with language understanding and problem solving in real time. It would benefit from having memories that contain perceptual representations, such as mental maps, in addition to traditional, linguistically represented knowledge. Unlike Brooks’ robots, this robot would have to understand language, reason, and learn at various levels. Further, the fact that everything the robot does is with respect to achieving goals in the world means that all its reasoning is subject to whether something will be good enough for the task at hand, rather than whether the solution is “correct” with respect to the abstract version of the problem. Almost all the issues that have been raised as objections or extensions to GOFAI arise naturally in this context: situatedness, use of the external world as representation, active perception, and embodiment are just some examples.
On the other hand, impressive as such a robot might be as an AI achievement, in the task of proving theorems in Group Theory it might still fall short of theorem provers that have been built solely for that task domain. The more general and flexible we want the AI system to be, the more numerous are the constraints placed on the proper idealization of cognition. But there is no sharp and complete characterization of the idealization of cognition. Should robots scratch their heads when they are thinking and feel a bit lost? Maybe scratching the head plays an important role in social aspects of cognition, which in turn might affect what they learn and how. While no one in AI today would seriously argue for a head-scratching robot as a better design, there is also no sharp dividing line between where cognition starts and perception and motor processes end.

Cognition, Architecture, Embodiment and Multimodality of Thought

Generality and flexibility are hallmarks of intelligence, and this has led to a search for cognitive architectures, exemplified by Soar (Newell, 1990) and ACT-R (Anderson, 1996). Different task-specific cognitive systems may be programmed or modeled by encoding domain- and task-specific knowledge in the architecture. These architectures are also based on idealizations of biological intelligence. Abstracting from human cognition, they typically posit a working memory (WM), a long term memory (LTM), mechanisms to retrieve from LTM and place in WM information relevant to the task, mechanisms that help the agent set up and explore a problem space, and mechanisms that enable the agent to learn from experience. The specific mechanisms proposed and the representational formalisms on which they work together constitute the architecture designer’s theory of cognition. Because they originate from an idealization of human cognition, it is not surprising that Soar and ACT-R are useful both for building AI agents and for building cognitive models.
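The WM/LTM organization described above can be made concrete with a small sketch. The Python below is an illustrative toy under stated assumptions, not the mechanism of Soar or ACT-R: the names `Rule` and `Agent`, the tuple encoding of facts, and the coffee-robot facts are all hypothetical, and the learning mechanisms mentioned in the text are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """A hypothetical LTM item: applicable when all its conditions are in WM."""
    name: str
    conditions: frozenset
    additions: frozenset


@dataclass
class Agent:
    """Toy agent with a long term memory (rules) and a working memory (facts)."""
    ltm: list
    wm: set = field(default_factory=set)

    def retrieve(self):
        # Retrieval: LTM rules whose conditions hold in WM and that
        # would contribute at least one fact WM does not yet contain.
        return [r for r in self.ltm
                if r.conditions <= self.wm and not r.additions <= self.wm]

    def step(self):
        # One cycle: retrieve relevant knowledge and place it in WM.
        rules = self.retrieve()
        if not rules:
            return False
        self.wm |= rules[0].additions
        return True

    def run(self, max_steps=10):
        # Repeat cycles until nothing new can be retrieved.
        steps = 0
        while steps < max_steps and self.step():
            steps += 1
        return steps


# Illustrative facts loosely based on the coffee-making robot scenario.
coffee_agent = Agent(
    ltm=[
        Rule("need-milk",
             frozenset({("goal", "make-coffee"), ("fridge", "no-milk")}),
             frozenset({("subgoal", "buy-milk")})),
        Rule("go-shopping",
             frozenset({("subgoal", "buy-milk")}),
             frozenset({("plan", "drive-to-supermarket")})),
    ],
    wm={("goal", "make-coffee"), ("fridge", "no-milk")},
)
steps = coffee_agent.run()
```

In this sketch every fact in WM is a symbolic tuple, so perception would have to deliver its results already in that form before the agent could use them.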
An important aspect of the representational commitment of these architectures is that the cognitive state, roughly characterized as the content of the WM, is symbolic, or to use a more precise term, predicate-symbolic. That is, the knowledge in LTM as well as the representations in WM are compositions of symbol strings where the symbols stand for individuals, relations between individuals, or various ways of composing relational predicates, in some domain of interest. For example, in a blocks world, a state representation might be ON(A,B) & Left(B,C). In this, the designs of these architectures share the commitment to symbolic cognitive state representation with almost all of AI (knowledge representation) and Cognitive Science (the Language of Thought hypothesis).

Our coffee-making robot would need not only cognition as these architectures model it, but would also need to perceive and perform motor activities. The relationship of cognitive architecture as currently conceived to perception and motor systems is given in Figure 1. Together, the boxes on the right within the gray polygon correspond to an architecture such as Soar or ACT-R. The perception modules supply information about the world in the predicate-symbolic form, and the Action module executes an action specification in the predicate-symbolic form, such as Move(A, Table), produced by cognition. The perception and action modules are of course essential in this way for the agent to work in the world, but they don’t do any “thinking.” That is performed by cognition using predicate-symbolic representations.

Fig. 1. In the current frameworks, Perception and Action are modules external to Cognition. They do not participate in
Publication date: 2006